152 PART 4 Comparing Groups
In this case, because the p value is greater than 0.05, equal variances can be
assumed, and these data would qualify for the classic Student t test. As described
earlier, R sidesteps this decision because its t test uses Welch's t test by
default, and Welch's test accommodates both unequal and equal variances.
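To see why Welch's test is a safe default, it can help to compute both statistics side by side. The following is a minimal sketch in Python, not R's implementation. The function names `welch_t` and `student_t` and all sample values are made up for illustration, and no p values are computed. When the two groups have the same size, the two t statistics coincide. When the spreads also match, the degrees of freedom agree as well. When the spreads differ a lot, Welch's degrees of freedom shrink, which makes the test more cautious.

```python
from math import sqrt
from statistics import mean, variance  # variance() is the sample (n - 1) variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    se2_x = variance(x) / nx  # squared standard error of mean(x)
    se2_y = variance(y) / ny  # squared standard error of mean(y)
    se2 = se2_x + se2_y
    t = (mean(x) - mean(y)) / sqrt(se2)
    df = se2 ** 2 / (se2_x ** 2 / (nx - 1) + se2_y ** 2 / (ny - 1))
    return t, df


def student_t(x, y):
    """Classic Student t statistic with a pooled variance (assumes equal variances)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Made-up samples with identical spread: both tests give t = -1 with df = 8.
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]
print(welch_t(a, b))
print(student_t(a, b))

# Made-up samples with very different spread: Welch's df drops to roughly 2.1,
# far below the nx + ny - 2 = 6 that the Student t test would use.
c = [0, 10, 20]
print(welch_t(a, c))
```

In R, the same choice is controlled by the `var.equal` argument of `t.test()`, which defaults to `FALSE` (Welch).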
Assessing the ANOVA
In this section, we present the basic concepts underlying the analysis of variance
(ANOVA), which compares the means of three or more groups. We also describe
some of the more popular post-hoc tests used to follow a statistically significant
ANOVA. Finally, we show you how to run commands to execute an ANOVA and
post-hoc tests in R, and interpret the output.
Grasping how the ANOVA works
As described earlier in “Surveying Student t tests,” it is only possible to run a t
test on two groups. This is why we demonstrated the t test comparing married
NHANES participants (M) to all other marital statuses (OTH). We were testing the
null hypothesis M – OTH = 0 because we were only allowed to compare two groups!
So when comparing three groups, such as married (M), never married (NM), and
all others (OTH), it's natural to think of pairing up the groups and running three t
tests (testing M – NM, then M – OTH, then NM – OTH). But running an exhaustive
set of two-group t tests increases the likelihood of Type I error, which occurs when
a comparison comes out statistically significant purely by chance (for a review,
read Chapter 3). And this is just with three groups!
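A quick calculation shows how fast the risk grows. This sketch assumes each test is run at α = 0.05 and that the tests are independent. The pairwise t tests described here share groups, so they are not truly independent, and the numbers are only a rough illustration. The function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false positive among k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


for k in (1, 3, 10):
    print(k, round(familywise_error_rate(0.05, k), 3))
# 1 0.05
# 3 0.143
# 10 0.401
```

With three comparisons, the chance of at least one spurious "significant" result is already close to three times the nominal 5 percent.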
The general rule is that N groups can be paired up in $N(N-1)/2$ different ways,
so in a study with six groups, you'd have $6 \times 5/2$, or 15 two-group comparisons,
which is way too many.
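The pair-counting rule can be checked by listing the pairs directly. In this minimal Python sketch, `pairwise_comparisons` is a hypothetical helper, and the six-group labels `G1` through `G6` are placeholders rather than real categories.

```python
from itertools import combinations
from math import comb


def pairwise_comparisons(groups):
    """Every unordered pair of groups, i.e., every possible two-group t test."""
    return list(combinations(groups, 2))


three = ["M", "NM", "OTH"]
print(pairwise_comparisons(three))
# [('M', 'NM'), ('M', 'OTH'), ('NM', 'OTH')]

six = [f"G{i}" for i in range(1, 7)]  # hypothetical labels for six groups
print(len(pairwise_comparisons(six)))  # 15

# The count always matches N(N - 1)/2, which is also "N choose 2".
for n in range(2, 11):
    assert len(pairwise_comparisons(range(n))) == n * (n - 1) // 2 == comb(n, 2)
```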
The term one-way ANOVA refers to an ANOVA with only one grouping variable in
it. The grouping variable usually has three or more levels because if it has only
two, most analysts just do a t test. In an ANOVA, you are testing how spread out
the means of the various levels are from each other. It is not unusual for students
to be asked to calculate an ANOVA manually in a statistics class, but we skip that
here and just describe the result. One result of an ANOVA calculation is a test
statistic called the F ratio (designated simply as F). The F is the
ratio of how much variability there is between the groups relative to how much
variability there is within the groups. If the null hypothesis is true, and no true
difference exists between the groups (meaning the average fasting glucose in
M = NM = OTH), then the F ratio should be close to 1. Also, F's sampling
fluctuations should follow the Fisher F distribution (see Chapter 24), which is actually a
family of distribution functions characterized by the following two numbers seen
in the ANOVA calculation: